| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Data Preprocessing | Data preprocessing involves cleaning, transforming, and preparing raw data for machine learning algorithms. | Before training any machine learning model, to enhance its performance and accuracy. | Data cleaning, feature scaling, handling missing values, encoding categorical variables. | Improves the quality of data, reduces errors, enhances model performance. | Time-consuming, requires domain knowledge, may lead to information loss. |
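
A minimal preprocessing sketch, assuming scikit-learn and pandas are available; the toy DataFrame, column names, and imputation/scaling choices are illustrative only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data with a missing value and a categorical column.
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [40_000, 52_000, 61_000, 58_000],
    "city": ["Paris", "Berlin", "Paris", "Madrid"],
})

# Impute and scale the numeric columns; one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot city columns
```
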
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Regression | Regression is a statistical method used for predicting the value of a dependent variable based on one or more independent variables. | When the target variable is continuous and you want to predict its value. | Sales forecasting, stock price prediction, demand estimation. | Provides insights into relationships between variables, easy to interpret. | Basic forms assume a linear relationship, sensitive to outliers. |
| Simple Linear Regression | Simple linear regression models the relationship between a single independent variable and a dependent variable using a linear function. | When there is a linear relationship between two variables. | Predicting house prices from floor area alone, estimating temperature as a function of time of day. | Simple and easy to understand, provides a baseline model. | Limited to linear relationships, may not capture complex patterns. |
| Multiple Linear Regression | Multiple linear regression models the relationship between multiple independent variables and a dependent variable using a linear function. | When there are multiple predictors influencing the target variable. | Predicting house prices using features like area, number of bedrooms, and location. | Incorporates multiple predictors, provides more accurate predictions. | Assumes a linear relationship, sensitive to multicollinearity. |
| Polynomial Regression | Polynomial regression fits a polynomial curve to the data to capture non-linear relationships between variables. | When the relationship between variables is non-linear. | Modeling growth rates in biology, predicting stock prices with seasonal trends. | Can model complex relationships, flexible. | May overfit the data, requires careful selection of the polynomial degree. |
| Support Vector Regression (SVR) | Support vector regression applies the support vector machine approach to regression, fitting a function that keeps prediction errors within a tolerance margin. | When dealing with small to medium-sized datasets with non-linear relationships. | Stock price prediction, energy consumption forecasting. | Effective in high-dimensional spaces, robust to overfitting. | Can be computationally expensive, requires careful parameter tuning. |
| Decision Tree Regression | Decision tree regression builds a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. | When the relationship between features and the target variable is non-linear and the data is structured. | Sales forecasting, predicting customer churn. | Easy to understand and interpret, handles both numerical and categorical data. | Prone to overfitting, sensitive to small variations in the data. |
| Random Forest Regression | Random forest regression is an ensemble learning method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting. | When dealing with complex non-linear relationships and large datasets. | Predicting customer lifetime value, financial forecasting. | Reduces overfitting, handles high-dimensional data, robust to noise and outliers. | Less interpretable than individual decision trees, can be computationally expensive. |
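
A minimal sketch comparing a few of the regressors above, assuming scikit-learn; the synthetic features and target are made up for illustration, so the exact scores will vary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))                            # two synthetic features
y = 3 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 1, 200)     # non-linear target with noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "polynomial regression (degree 2)": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "random forest regression": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # R^2 on held-out data; the non-linear models should fit the squared term better.
    print(f"{name}: R^2 = {model.score(X_test, y_test):.3f}")
```
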
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Classification | Classification is a supervised learning task where the goal is to categorize input data into predefined classes or categories. | When the target variable is categorical and you want to predict its class label. | Email spam detection, sentiment analysis, image recognition. | Provides clear insights into class boundaries, can handle both binary and multi-class problems. | Requires labeled data for training, sensitive to imbalanced classes. |
| Logistic Regression | Logistic regression is a statistical method used for binary classification. It models the probability that an instance belongs to a particular class. | For binary classification problems. | Credit risk analysis, medical diagnosis, customer churn prediction. | Outputs have a probabilistic interpretation, simple and efficient for small datasets. | Not suitable for complex non-linear relationships without additional feature engineering. |
| K-Nearest Neighbors (K-NN) | K-Nearest Neighbors is a non-parametric, lazy learning algorithm that classifies instances based on the majority class among their k nearest neighbors in feature space. | For classification and regression problems, especially when decision boundaries are not well-defined. | Handwriting recognition, recommendation systems, anomaly detection. | Simple and intuitive, no explicit training phase, handles multi-class problems. | Computationally expensive at prediction time, sensitive to irrelevant features and outliers. |
| Support Vector Machine (SVM) | Support Vector Machine is a supervised learning algorithm used for classification and regression tasks. It finds the hyperplane that best separates classes in feature space. | For binary classification problems, especially when dealing with high-dimensional data. | Text classification, image recognition, bioinformatics. | Effective in high-dimensional spaces, memory efficient, versatile due to the kernel trick. | Computationally expensive for large datasets, sensitive to noise and parameter tuning. |
| Kernel SVM | Kernel SVM is an extension of SVM that allows for non-linear decision boundaries by transforming the feature space using kernel functions. | When the data is not linearly separable. | Image recognition, bioinformatics, text classification. | Effective in high-dimensional spaces, handles non-linear relationships. | Choosing the right kernel function and its parameters can be challenging. |
| Naive Bayes | Naive Bayes is a probabilistic classifier based on Bayes' theorem with the assumption of independence between features. | When dealing with text classification or when the independence assumption roughly holds. | Email spam filtering, document classification, sentiment analysis. | Simple and efficient, works well with high-dimensional data, handles missing values. | Assumes independence between features, can be outperformed by more complex models. |
| Decision Tree Classification | Decision tree classification builds a model that predicts the class label of an instance by following a series of decision rules inferred from the data features. | When the decision boundaries are non-linear and the data is structured. | Customer segmentation, medical diagnosis, credit scoring. | Easy to interpret, handles both numerical and categorical data. | Prone to overfitting, sensitive to small variations in the data. |
| Random Forest Classification | Random forest classification is an ensemble learning method that builds multiple decision trees and combines their predictions to improve accuracy and reduce overfitting. | When dealing with complex non-linear relationships and large datasets. | Image classification, fraud detection, recommendation systems. | Reduces overfitting, handles high-dimensional data, robust to noise and outliers. | Less interpretable than individual decision trees, can be computationally expensive. |
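
A minimal sketch fitting several of the classifiers above on a synthetic dataset, assuming scikit-learn; the dataset parameters are arbitrary and the accuracies are only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic binary classification problem with 10 features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "kernel SVM (RBF)": SVC(kernel="rbf"),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: accuracy = {clf.score(X_test, y_test):.3f}")
```
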
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Clustering | Clustering is an unsupervised learning task where the goal is to group similar instances together into clusters. | When there is no predefined label or target variable, and you want to explore the structure of the data. | Customer segmentation, document clustering, anomaly detection. | Reveals hidden patterns and structures in data, does not require labeled data. | Choosing the right number of clusters can be subjective, sensitive to initialization. |
| K-Means Clustering | K-Means clustering partitions the data into k clusters by iteratively assigning instances to the nearest cluster centroid and updating centroids. | When the number of clusters is known or can be estimated, and clusters are spherical. | Market segmentation, image compression, anomaly detection. | Simple and computationally efficient, scales well to large datasets. | Requires specifying the number of clusters in advance, sensitive to initial cluster centroids. |
| Hierarchical Clustering | Hierarchical clustering builds a tree-like hierarchy of clusters by recursively merging or splitting clusters based on their similarity. | When the number of clusters is not known or when the data has a hierarchical structure. | Taxonomy creation, gene expression analysis, social network analysis. | Does not require specifying the number of clusters in advance, captures hierarchical relationships. | Computationally expensive, less scalable than K-Means. |
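
A minimal clustering sketch with K-Means and (agglomerative) hierarchical clustering, assuming scikit-learn; the blob data and the choice of 4 clusters are illustrative assumptions.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
agglo = AgglomerativeClustering(n_clusters=4).fit(X)

# Silhouette score (higher is better) as a label-free quality check.
print("k-means silhouette:      ", round(silhouette_score(X, kmeans.labels_), 3))
print("hierarchical silhouette: ", round(silhouette_score(X, agglo.labels_), 3))
```
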
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Association Rule Learning | Association rule learning discovers interesting relationships between variables in large datasets by identifying frequent itemsets and deriving association rules. | When analyzing transactional data, e.g. for market basket analysis. | Market basket analysis, recommendation systems, cross-selling strategies. | Reveals hidden patterns in data, produces interpretable rules. | Scalability issues with large datasets, sensitive to noise and sparsity. |
| Apriori | Apriori is a popular algorithm for mining frequent itemsets and generating association rules from transactional data. | When analyzing transactional data to discover frequent itemsets and association rules. | Market basket analysis, inventory management, website navigation analysis. | Easy to implement and interpret, prunes the search space using the downward-closure (Apriori) property. | Requires multiple passes over the data, computationally intensive when there are many candidate itemsets. |
| Eclat | Eclat is an alternative to Apriori for mining frequent itemsets from transactional data, using a depth-first search over vertical (item-to-transaction) representations. | When scalability is a concern and the dataset is sparse. | Market basket analysis, web usage mining, customer segmentation. | More memory efficient than Apriori, handles sparse datasets well. | Limited to mining frequent itemsets, does not generate association rules directly. |
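
A simplified, pure-Python sketch of the level-wise (Apriori-style) search for frequent itemsets; the transactions and the 0.6 support threshold are made up, and rule generation is omitted. In practice a dedicated library (e.g. mlxtend) is typically used instead.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]
min_support = 0.6  # itemset must appear in at least 60% of transactions

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({item for t in transactions for item in t})
frequent = {}
level = [frozenset([i]) for i in items]
while level:
    # Keep only the itemsets of the current size that meet the support threshold.
    level = [s for s in level if support(s) >= min_support]
    frequent.update({s: support(s) for s in level})
    # Apriori pruning idea: build next-level candidates only from frequent itemsets.
    level = sorted({a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1},
                   key=sorted)

for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), f"support={sup:.2f}")
```
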
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | Reinforcement learning is a type of machine learning where an agent learns to make decisions by trial and error, aiming to maximize cumulative rewards in a dynamic environment. | When the environment is not fully known and the agent needs to learn through interaction. | Game playing (e.g., AlphaGo), robotics, autonomous driving, recommendation systems. | Can handle complex, dynamic environments; capable of learning optimal strategies without explicit supervision. | High computational requirements, exploration-exploitation trade-off, can be sensitive to hyperparameters. |
| Upper Confidence Bound (UCB) | UCB is an algorithm used in multi-armed bandit problems where the goal is to balance exploration and exploitation. | When dealing with decision-making under uncertainty with limited resources. | Online advertising, clinical trials, resource allocation. | Efficient exploration-exploitation trade-off, simple to implement. | May not perform optimally in all scenarios, assumes stationary reward distributions. |
| Thompson Sampling | Thompson Sampling is another approach to multi-armed bandit problems: it maintains a probability distribution over each action's expected reward, samples from these distributions, and picks the action with the highest sample. | Similar to UCB, used in problems where the exploration-exploitation trade-off is crucial. | Online advertising, clinical trials, resource allocation. | Incorporates uncertainty naturally, can adapt to changing environments. | Can be computationally intensive, requires choosing prior distributions. |
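
A minimal UCB sketch on a simulated 3-armed bandit (think of choosing between three ads); the click-through rates and round count are invented, and the standard UCB1 exploration term is used.

```python
import numpy as np

rng = np.random.default_rng(0)
true_ctr = np.array([0.05, 0.03, 0.12])   # hypothetical click-through rates of 3 ads
n_arms, n_rounds = len(true_ctr), 5_000

counts = np.zeros(n_arms)    # number of times each arm was pulled
rewards = np.zeros(n_arms)   # total reward collected per arm

for t in range(1, n_rounds + 1):
    if t <= n_arms:
        arm = t - 1  # pull every arm once before using the UCB formula
    else:
        # UCB1: empirical mean + exploration bonus that shrinks as an arm is pulled more.
        ucb = rewards / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < true_ctr[arm]  # simulated Bernoulli click/no-click feedback
    counts[arm] += 1
    rewards[arm] += reward

print("pulls per arm:", counts.astype(int))  # most pulls should concentrate on the best arm
```
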
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Natural Language Processing | NLP involves the interaction between computers and humans through natural language. | When dealing with unstructured text data and tasks involving understanding, interpreting, and generating human language. | Sentiment analysis, language translation, chatbots, information retrieval. | Enables machines to understand and generate human language, facilitates communication between humans and machines. | Ambiguity in language, domain-specific challenges, data scarcity for certain languages or tasks. |
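
A minimal NLP sketch, assuming scikit-learn: TF-IDF features plus logistic regression for toy sentiment classification; the four example sentences and their labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical sentiment dataset (1 = positive, 0 = negative).
texts = [
    "I loved this movie, it was fantastic",
    "Absolutely terrible, a waste of time",
    "Great acting and a wonderful story",
    "Boring plot and poor dialogue",
]
labels = [1, 0, 1, 0]

# Turn raw text into TF-IDF vectors, then fit a linear classifier on top.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["what a wonderful, fantastic film"]))  # likely prints [1] on this toy data
```
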
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Deep Learning | Deep Learning is a subset of machine learning where artificial neural networks with multiple layers learn representations of data. | When dealing with large, complex datasets and tasks that involve pattern recognition. | Image and speech recognition, natural language processing, autonomous driving. | Capable of learning complex patterns, automatically extracts features, scalable with large datasets. | Requires large amounts of data, computationally intensive, prone to overfitting. |
| Artificial Neural Networks (ANNs) | ANNs are computing systems inspired by the biological neural networks of animal brains. | When dealing with tasks like pattern recognition, classification, and regression. | Image recognition, speech recognition, financial forecasting. | Flexible architecture, capable of learning non-linear relationships. | Prone to overfitting, black box nature, requires large amounts of data for training. |
| Convolutional Neural Networks (CNNs) | CNNs are a type of deep neural network specifically designed for processing structured grid-like data. | When dealing with tasks involving image recognition, computer vision, and spatial data. | Object detection, image classification, medical image analysis. | Hierarchical feature learning, parameter sharing, translation invariance. | Requires large amounts of training data, computationally intensive. |
| Recurrent Neural Networks (RNNs) | RNNs are neural networks designed to work with sequential data by maintaining a state or memory. | When dealing with sequential data like time series, text, or speech. | Language modeling, machine translation, speech recognition. | Can handle variable-length sequences, captures temporal dependencies. | Vulnerable to vanishing/exploding gradient problem, difficulty capturing long-term dependencies. |
| Self-Organizing Maps (SOMs) | SOMs are a type of unsupervised learning neural network used for dimensionality reduction and visualization. | When visualizing high-dimensional data or discovering patterns in data. | Clustering, visualization of high-dimensional data. | Topological ordering, dimensionality reduction, visual representation of data. | Sensitivity to parameters, computationally expensive for large datasets. |
| Boltzmann Machines | Boltzmann Machines are stochastic generative models that learn probability distributions over binary-valued data. | When modeling complex data distributions or performing unsupervised learning tasks. | Dimensionality reduction, feature learning, collaborative filtering. | Capable of learning complex dependencies in data, unsupervised learning. | Training can be slow, difficult to scale to large datasets. |
| AutoEncoders | AutoEncoders are neural networks designed for unsupervised learning by learning to encode and decode data efficiently. | When performing tasks like data denoising, dimensionality reduction, or feature learning. | Anomaly detection, image denoising, recommendation systems. | Can learn compact representations of data, unsupervised learning. | Requires careful tuning of architecture and hyperparameters, sensitive to noise. |
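
A minimal deep learning sketch, assuming TensorFlow/Keras is installed: a tiny CNN trained for one epoch on the MNIST digits (downloaded by `load_data()`). The layer sizes and training settings are arbitrary choices for illustration, not a recommended architecture.

```python
import tensorflow as tf

# Load and scale the MNIST digit images to [0, 1], adding a channel axis for the CNN.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# A very small convolutional network: conv -> pool -> dense -> softmax over 10 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.3f}")
```
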
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Dimensionality Reduction | Dimensionality reduction techniques aim to reduce the number of random variables under consideration by obtaining a set of principal variables. | When dealing with high-dimensional data to simplify analysis and visualization. | Visualization, noise reduction, feature extraction. | Reduces computational complexity, removes redundant features, can improve model performance. | May lose some information, requires careful selection of the number of dimensions. |
| Principal Component Analysis (PCA) | PCA is a dimensionality reduction technique that identifies the directions (principal components) that maximize the variance in the data. | When dealing with high-dimensional data to reduce its dimensionality while preserving most of its variance. | Data visualization, noise reduction, feature extraction. | Reduces dimensionality while preserving information, removes correlated features. | Assumes linear relationships, may not perform well for non-linear data. |
| Linear Discriminant Analysis (LDA) | LDA is a dimensionality reduction technique used in classification tasks to find the feature subspace that maximizes class separability. | When performing classification tasks and reducing dimensionality. | Pattern recognition, feature extraction, classification. | Maximizes class separability, supervised dimensionality reduction. | Assumes normal distribution of data, sensitive to outliers. |
| Kernel PCA | Kernel PCA is a non-linear extension of PCA that uses kernel methods to project data into a higher-dimensional space before applying PCA. | When dealing with non-linear data structures and traditional PCA is not sufficient. | Non-linear dimensionality reduction, pattern recognition. | Handles non-linear relationships, captures complex structures in data. | Computational complexity increases with the size of the dataset, selection of an appropriate kernel function is crucial. |
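
A minimal sketch of PCA, LDA, and kernel PCA on the Iris dataset, assuming scikit-learn; the choice of 2 components and the RBF `gamma` value are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# Unsupervised: keep the 2 directions of maximum variance.
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(3))

# Supervised: LDA uses the class labels to maximize class separability.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Non-linear: kernel PCA with an RBF kernel.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(X)

print(X_pca.shape, X_lda.shape, X_kpca.shape)  # each is (150, 2)
```
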
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Model Selection | Model selection involves choosing the best model from a set of candidate models based on some evaluation criterion. | When comparing multiple models to determine which one performs best for a given task. | Machine learning, statistics, optimization. | Improves generalization performance, selects the most suitable model for the problem. | Requires careful choice of evaluation metrics, can be computationally expensive. |
| k-Fold Cross Validation | k-fold cross validation estimates the performance of a model by splitting the data into k subsets (folds) and training/testing the model k times, each time holding out a different fold for testing. | When evaluating the performance of a model and estimating its generalization error. | Model evaluation, hyperparameter tuning. | Provides more reliable performance estimates, reduces bias in performance evaluation. | Can be computationally expensive, results can vary with how the folds are split. |
| Grid Search | Grid search is a technique for hyperparameter optimization: a grid of hyperparameter values is specified and the best combination is selected based on model performance. | When tuning the hyperparameters of machine learning models. | Hyperparameter optimization, model selection. | Systematic, exhaustive approach to hyperparameter tuning. | Computationally expensive, may not scale well with high-dimensional hyperparameter spaces. |
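
A minimal model selection sketch, assuming scikit-learn: 10-fold cross validation of an SVM, then a grid search over `C` and `gamma` (the grid values are arbitrary starting points).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 10-fold cross validation of a single model.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=10)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")

# Grid search over C and gamma, each candidate evaluated with 10-fold CV.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, n_jobs=-1)
search.fit(X, y)
print("best parameters:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```
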
| Algorithm | Description | When To Use | Applications | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Boosting | Boosting is an ensemble learning technique that combines multiple weak learners (simple models) sequentially to build a strong learner. | When you have a large dataset and want to improve the performance of weak models. | Classification and regression tasks in various domains such as finance, healthcare, marketing, and e-commerce. | High predictive accuracy, robustness to overfitting, versatility, handles imbalanced data well. | Computationally expensive, sensitive to noisy data, requires careful parameter tuning, less interpretable. |
| XGBoost | XGBoost is an implementation of gradient boosting machines, a popular ensemble learning technique that builds a series of weak learners and combines them to make predictions. | When dealing with structured/tabular data and aiming for high predictive accuracy. | Regression, classification, ranking. | High predictive accuracy, handles missing data, built-in regularization. | Requires careful tuning of hyperparameters, can be computationally expensive. |
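
A minimal XGBoost sketch, assuming the `xgboost` package is installed, on the scikit-learn breast cancer dataset; the hyperparameter values are arbitrary starting points, not tuned.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees via the scikit-learn-compatible XGBoost wrapper.
model = XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```
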